Abstract. Many algorithms use concrete data types with some additional
invariants. The set of values satisfying the invariants is often a set
of representatives for the equivalence classes of some equational theory. 
For instance, a sorted list is a particular representative wrt commuta- 
tivity. Theories like associativity, neutral element, idempotence, etc. are 
also very common. Now, when one wants to combine various invariants, 
it may be difficult to find the suitable representatives and to efficiently 
implement the invariants. The preservation of invariants throughout the 
whole program is even more difficult and error prone. Classically, the 
programmer solves this problem using a combination of two techniques: 
the definition of appropriate construction functions for the representa- 
tives and the consistent usage of these functions ensured via compiler 
verifications. The common way of ensuring consistency is to use an ab- 
stract data type for the representatives; unfortunately, pattern matching 
on representatives is lost. A more appealing alternative is to define a 
concrete data type with private constructors so that both compiler ver- 
ification and pattern matching on representatives are granted. In this 
paper, we detail the notion of private data type and study the existence 
of construction functions. We also describe a prototype, called Moca, 
that addresses the entire problem of defining concrete data types with 
invariants: it generates efficient construction functions for the combina- 
tion of common invariants and builds representatives that belong to a 
concrete data type with private constructors. 



1 Introduction 

Many algorithms use data types with some additional invariants. Every function
creating a new value from old ones must be defined so that the newly created
value satisfies the invariants whenever the old ones do.

One way to easily maintain invariants is to use abstract data types (ADT): 
the implementation of an ADT is hidden and construction and observation func- 
tions are provided. A value of an ADT can only be obtained by recursively using 
the construction functions. Hence, an invariant can be ensured by using appropri- 
ate construction functions. Unfortunately, abstract data types preclude pattern 
matching, a very useful feature of modern programming languages [10,11,16, 
15]. There have been various attempts to combine both features in some way. 



In [23], P. Wadler proposed the mechanism of views. A view on an ADT
$\alpha$ is given by providing a concrete data type (CDT) $\gamma$ and two functions
$in : \alpha \to \gamma$ and $out : \gamma \to \alpha$ such that $in \circ out = id_\gamma$ and
$out \circ in = id_\alpha$. Then, a function on $\alpha$ can be defined by matching on $\gamma$ (by
implicitly using $in$) and the values of type $\gamma$ obtained by matching can be
injected back into $\alpha$ (by implicitly using $out$). However, by leaving the
applications of $in$ and $out$ implicit,
we can easily get inconsistencies whenever in and out are not inverses of each 
other. Since it may be difficult to satisfy this condition (consider for instance the 
translations between cartesian and polar coordinates), these views have never 
been implemented. Following the suggestion of W. Burton and R. Cameron 
to use the in function only [3], some proposals have been made for various
programming languages but none has been implemented yet [4, 17].

In [3], W. Burton and R. Cameron proposed another very interesting idea 
which seems to have attracted very little attention. An ADT must provide con- 
struction and observation functions. When an ADT is implemented by a CDT, 
they propose to also export the constructors of the CDT but only for using 
them as patterns in pattern matching clauses. Hence, the constructors of the 
underlying CDT can be used for pattern matching but not for building values: 
only the construction functions can be used for that purpose. Therefore, one can 
both ensure some invariants and offer pattern matching. These types have been 
introduced in OCaml by the third author [24] under the name of concrete data 
type with private constructors, or private data type (PDT) for short. 

Now, many invariants on concrete data types can be related to some equational
theory. Take for instance the type of lists with the constructors [] and ::.
Given some elements $v_1, \ldots, v_n$, the sorted list whose elements are $v_1, \ldots, v_n$ is a
particular representative of the equivalence class of $v_1 :: \ldots :: v_n :: []$ modulo the
equation x::y::l=y::x::l. Requiring that, in addition, the list does not contain the same
element twice yields a particular representative modulo the equation x::x::l=x::l.
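As a minimal sketch of this idea (the names `cons_norm` and `of_list` are hypothetical and not taken from the paper), a construction function for sorted, duplicate-free lists can insert an element into a list that is already a representative:

```ocaml
(* Construction function for "::" modulo x::y::l = y::x::l and x::x::l = x::l.
   Assumption: l is already sorted and duplicate-free, and the built-in
   polymorphic comparison is the chosen ordering. *)
let rec cons_norm x l =
  match l with
  | [] -> [x]
  | y :: _ when x < y -> x :: l        (* x belongs in front *)
  | y :: _ when x = y -> l             (* x is already present *)
  | y :: t -> y :: cons_norm x t       (* keep looking further down *)

(* Normalize an arbitrary list by rebuilding it with cons_norm. *)
let of_list xs = List.fold_right cons_norm xs []
```

Two lists that are equal modulo the two equations are mapped to the same representative, so structural equality on representatives decides the equivalence.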

Consider now the type of join lists with the constructors empty, singleton and 
append, for which concatenation is of constant complexity. Sorting corresponds 
to associativity and commutativity of append. Requiring that no argument of 
append is empty corresponds to neutrality of empty wrt append. We have a 
structure of commutative monoid. 

More generally, given some equational theory on a concrete data type, one 
may wonder whether there exists a representative for each equivalence class and, 
if so, whether a representative of $C(t_1 \ldots t_n)$ can be efficiently computed knowing
that $t_1 \ldots t_n$ are themselves representatives.

In [21,22], S. Thompson describes a mechanism introduced in the Miranda 
functional programming language for implementing such non-free concrete data 
types without precluding pattern matching. The idea is to provide conditional 
rewrite rules, called laws, that are implicitly applied as long as possible on every 
newly created value. This can also be achieved by using a PDT whose construction
functions (primed constructors in [21]) apply each of the laws as long as possible.
Then, S. Thompson studies how to prove the correctness of functions
defined by pattern matching on such lawful types. However, few hints are given 



on how to check whether the laws indeed implement the invariants one has in 
mind. For this reason and because reasoning on lawful types is difficult, the law 
mechanism was removed from Miranda. 

In this paper, we propose to specify the invariants by unoriented equations 
(instead of rules). We will call such a type a relational data type (RDT). Sec- 
tions 2 and 3 introduce private and relational data types. Then, we study when 
an RDT can be implemented by a PDT, that is, when there exist construction 
functions computing some representative for each equivalence class. Section 4 
provides some general existence theorem based on rewriting theory. But rewrit- 
ing may be inefficient. Section 5 provides, for some common equational theories, 
construction functions more efficient than the ones based on rewriting. Section 
6 presents Moca, an extension of OCaml with relational data types whose con- 
struction functions are automatically generated. Finally, Section 7 discusses some 
possible extensions. 

2 Concrete data types with private constructors 

We first recall the definition of a first-order term algebra. It will be useful for 
defining the values of concrete and private data types. 

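Definition 1 (First-order term algebra) A sorted term algebra definition is
a triplet $\mathcal{A} = (\mathcal{S}, \mathcal{C}, \Sigma)$ where $\mathcal{S}$ is a non-empty set of sorts, $\mathcal{C}$ is a non-empty
set of constructor symbols and $\Sigma : \mathcal{C} \to \mathcal{S}^+$ is a signature mapping a non-empty
sequence of sorts to every constructor symbol. We write $C : \sigma_1 \ldots \sigma_n \sigma_{n+1} \in \Sigma$
to denote the fact that $\Sigma(C) = \sigma_1 \ldots \sigma_n \sigma_{n+1}$. Let $\mathcal{X} = (\mathcal{X}_\sigma)_{\sigma \in \mathcal{S}}$ be a family
of pairwise disjoint sets of variables. The sets $\mathcal{T}_\sigma(\mathcal{A}, \mathcal{X})$ of terms of sort $\sigma$ are
inductively defined as follows:

- If $x \in \mathcal{X}_\sigma$, then $x \in \mathcal{T}_\sigma(\mathcal{A}, \mathcal{X})$.

- If $C : \sigma_1 \ldots \sigma_{n+1} \in \Sigma$ and $t_i \in \mathcal{T}_{\sigma_i}(\mathcal{A}, \mathcal{X})$, then $C(t_1, \ldots, t_n) \in \mathcal{T}_{\sigma_{n+1}}(\mathcal{A}, \mathcal{X})$.

Let $\mathcal{T}_\sigma(\mathcal{A})$ be the set of terms of sort $\sigma$ containing no variable.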

In the following, we assume given a set $\mathcal{S}_0$ of primitive types like int, string,
... and a set $\mathcal{C}_0$ of primitive constants 0, 1, "foo", ... Let $\Sigma_0$ be the corresponding
signature ($\Sigma_0(0) = $ int, ...).

In this paper, we call concrete data type (CDT) an inductive type à la ML
defined by a set of constructors. More formally: 

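Definition 2 (Concrete data type) A concrete data type definition is a triplet
$\Gamma = (\gamma, \mathcal{C}, \Sigma)$ where $\gamma$ is a sort, $\mathcal{C}$ is a non-empty set of constructor symbols and
$\Sigma : \mathcal{C} \to (\mathcal{S}_0 \cup \{\gamma\})^+$ is a signature such that, for all $C \in \mathcal{C}$, $\Sigma(C) = \sigma_1 \ldots \sigma_n \gamma$.
The set $Val(\gamma)$ of values of type $\gamma$ is the set of terms $\mathcal{T}_\gamma(\mathcal{A}_\Gamma)$ where $\mathcal{A}_\Gamma =
(\mathcal{S}_0 \cup \{\gamma\}, \mathcal{C}_0 \cup \mathcal{C}, \Sigma_0 \cup \Sigma)$.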

This definition of CDTs corresponds to a small but very useful subset of all 
the possible types definable in ML-like programming languages. For the purpose 
of this paper, it is not necessary to use a more complex definition. 



Example 1 The following type cexp (see footnote 4) is a CDT definition with two constant
constructors of sort cexp, a unary operator of sort cexp cexp and a binary operator
of sort cexp cexp cexp.

type cexp = Zero | One | Opp of cexp | Plus of cexp * cexp

Now, a private data type definition is like a CDT definition together with 
construction functions as in abstract data types. Constructors can be used as 
patterns as in concrete data types but they cannot be used for value creation 
(except in the definition of construction functions). For building values, one must 
use construction functions as in abstract data types. Formally: 

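Definition 3 (Private data type) A private data type definition is a pair
$\Pi = (\Gamma, \mathcal{F})$ where $\Gamma = (\pi, \mathcal{C}, \Sigma)$ is a CDT definition and $\mathcal{F}$ is a family of
construction functions $(f_C)_{C \in \mathcal{C}}$ such that, for all $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$,
$f_C : \mathcal{T}_{\sigma_1}(\mathcal{A}_\Gamma) \times \ldots \times \mathcal{T}_{\sigma_n}(\mathcal{A}_\Gamma) \to \mathcal{T}_\pi(\mathcal{A}_\Gamma)$. Let $Val(\Pi)$ be the set of the values of type $\pi$, that
is, the set of terms that one can build by using the construction functions only.
The function $f : \mathcal{T}_\pi(\mathcal{A}_\Gamma) \to \mathcal{T}_\pi(\mathcal{A}_\Gamma)$ such that, for all $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$ and
$t_i \in \mathcal{T}_{\sigma_i}(\mathcal{A}_\Gamma)$, $f(C(t_1 \ldots t_n)) = f_C(f(t_1) \ldots f(t_n))$, is called the normalization
function associated to $\mathcal{F}$.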

It is immediate to see that:

Lemma 1. $Val(\Pi)$ is the image of $f$.

PDTs have been implemented in OCaml by the third author [24]. Extending a 
programming language with PDTs is not very difficult: one only needs to modify 
the compiler to parse the PDT definitions and check that the conditions on the 
use of constructors are fulfilled. 

Note that construction functions have no constraint in general: the full power 
of the underlying programming language is available to define them. 

It should also be noted that, because the set of values of type $\pi$ is a subset
of the set of values of the underlying CDT $\gamma$, a function on $\pi$ defined by pattern
matching may be a total function even though it is not defined on all the possible
cases of $\gamma$. Defining a function with patterns that match no value of type $\pi$ does
no harm since the corresponding code will never be run. It however reveals that
the developer is not aware of the distinction between the values of the PDT and
those of the underlying CDT, and thus can be considered as a programming
error. To avoid this kind of error, it is important that a PDT comes with a
clear identification of its set of possible values. To go one step further, one could
provide a tool for checking the completeness and usefulness of patterns that takes
into account the invariants, when it is possible. We leave this for future work.

Example 2 Let us now start our running example with the type exp describing 
operations on arithmetic expressions. 

4 Examples are written in OCaml [10]; they can be readily translated into any
programming language offering pattern matching with textual priority, such as
Haskell, SML, etc.



type exp = private Zero | One | Opp of exp | Plus of exp * exp

This type exp is indeed a PDT built upon the CDT cexp. Prompted by the 
keyword private, the OCaml compiler forbids the use of exp constructors (out- 
side the module my_exp.ml containing the definition of exp) except in patterns. 
If Zero is supposed to be neutral by the writer of my_exp.ml, then he/she will 
provide construction functions as follows: 

let rec zero = Zero and one = One and opp x = Opp x
and plus = function
| (Zero, y) -> y
| (y, Zero) -> y
| (x, y) -> Plus (x, y)
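To make the effect of the private keyword concrete, here is a self-contained sketch that packages the type and its construction functions in a single module. The module name `Exp` and the function `size` are assumptions made for illustration; the paper itself places the definitions in a file my_exp.ml.

```ocaml
module Exp : sig
  (* Constructors are visible for pattern matching only. *)
  type exp = private Zero | One | Opp of exp | Plus of exp * exp
  val zero : exp
  val one : exp
  val opp : exp -> exp
  val plus : exp * exp -> exp
end = struct
  type exp = Zero | One | Opp of exp | Plus of exp * exp
  let zero = Zero
  let one = One
  let opp x = Opp x
  (* Zero is treated as neutral on both sides. *)
  let plus = function
    | (Zero, y) -> y
    | (x, Zero) -> x
    | (x, y) -> Plus (x, y)
end

(* Outside the module, constructors may still be used as patterns. *)
let rec size (e : Exp.exp) =
  match e with
  | Exp.Zero | Exp.One -> 1
  | Exp.Opp x -> 1 + size x
  | Exp.Plus (x, y) -> 1 + size x + size y

(* Writing Exp.Plus (Exp.Zero, Exp.One) here would be rejected by the
   type checker: values can only be built with the construction functions. *)
```

With this layout, every value of type `Exp.exp` seen by client code has gone through `plus`, so no client pattern of the form `Plus (Zero, _)` can ever match.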

3 Relational data types 

We mentioned in the introduction that, often, the invariants upon concrete data 
types are such that the set of values satisfying them is indeed a set of representa- 
tives for the equivalence classes of some equational theory. We therefore propose 
to specify invariants by a set of unoriented equations and study to which extent 
such a specification can be realized with an abstract or private data type. In 
case of a private data type however, it is important to be able to describe the 
set of possible values. 

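Definition 4 (Relational data type) A relational data type (RDT) definition
is a pair $(\Gamma, \mathcal{E})$ where $\Gamma = (\pi, \mathcal{C}, \Sigma)$ is a CDT definition and $\mathcal{E}$ is a finite
set of equations on $\mathcal{T}_\pi(\mathcal{A}_\Gamma, \mathcal{X})$. Let $=_\mathcal{E}$ be the smallest congruence relation
containing $\mathcal{E}$. Such an RDT is implementable by a PDT $(\Gamma, \mathcal{F})$ if the family of
construction functions $\mathcal{F} = (f_C)_{C \in \mathcal{C}}$ is valid wrt $\mathcal{E}$:

(Correctness) For all $C : \sigma_1 \ldots \sigma_n \pi$ and $v_i \in Val(\sigma_i)$, $f_C(v_1 \ldots v_n) =_\mathcal{E} C(v_1 \ldots v_n)$.
(Completeness) For all $C : \sigma_1 \ldots \sigma_n \pi$, $v_i \in Val(\sigma_i)$, $D : \tau_1 \ldots \tau_p \pi \in \Sigma$ and
$w_i \in Val(\tau_i)$, $f_C(v_1 \ldots v_n) = f_D(w_1 \ldots w_p)$ whenever $C(v_1 \ldots v_n) =_\mathcal{E} D(w_1 \ldots w_p)$.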

We are going to see that the existence of a valid family of construction func- 
tions is equivalent to the existence of a valid normalization function: 

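Definition 5 (Valid normalization function) A map $f : \mathcal{T}_\pi(\mathcal{A}_\Gamma) \to \mathcal{T}_\pi(\mathcal{A}_\Gamma)$
is a valid normalization function for an RDT $(\Gamma, \mathcal{E})$ with $\Gamma = (\pi, \mathcal{C}, \Sigma)$ if:

(Correctness) For all $t \in \mathcal{T}_\pi(\mathcal{A}_\Gamma)$, $f(t) =_\mathcal{E} t$.

(Completeness) For all $t, u \in \mathcal{T}_\pi(\mathcal{A}_\Gamma)$, $f(t) = f(u)$ whenever $t =_\mathcal{E} u$.

Note that a valid normalization function is idempotent ($f \circ f = f$) and
provides a decision procedure for $=_\mathcal{E}$ (the boolean function $\lambda x y. f(x) = f(y)$).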
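The following sketch illustrates the normalization function associated to a family of construction functions, and the decision procedure it induces. It assumes, for illustration only, that Zero is neutral on both sides of Plus; the names `normalize` and `equiv` are hypothetical.

```ocaml
type cexp = Zero | One | Opp of cexp | Plus of cexp * cexp

(* Construction functions (assumption: Zero is left and right neutral). *)
let opp x = Opp x
let plus = function
  | (Zero, y) -> y
  | (x, Zero) -> x
  | (x, y) -> Plus (x, y)

(* Normalization function associated to the family: every constructor
   is replaced, bottom-up, by a call to its construction function. *)
let rec normalize = function
  | Zero -> Zero
  | One -> One
  | Opp x -> opp (normalize x)
  | Plus (x, y) -> plus (normalize x, normalize y)

(* Decision procedure: two terms are equivalent iff their normal forms
   are syntactically equal. *)
let equiv x y = normalize x = normalize y
```

Idempotence can be observed directly: normalizing a term that is already a normal form returns it unchanged.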

Theorem 6 The normalization function associated to a valid family is a valid 
normalization function. 



Proof.

- Correctness. We proceed by induction on the size of $t \in \mathcal{T}_\pi$. We have
$C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$ and $t_i$ such that $t = C(t_1 \ldots t_n)$. By definition,
$f(t) = f_C(f(t_1) \ldots f(t_n))$. By induction hypothesis, $f(t_i) =_\mathcal{E} t_i$. Since the family is valid and
$f(t_1) \ldots f(t_n)$ are values, $f_C(f(t_1) \ldots f(t_n)) =_\mathcal{E} C(f(t_1) \ldots f(t_n))$. Thus, $f(t) =_\mathcal{E} t$.

- Completeness. Let $t, u \in \mathcal{T}_\pi$ such that $t =_\mathcal{E} u$. We have $t = C(t_1 \ldots t_n)$ and
$u = D(u_1 \ldots u_p)$. By definition, $f(t) = f_C(f(t_1) \ldots f(t_n))$ and $f(u) = f_D(f(u_1) \ldots f(u_p))$.
By correctness, $f(t_i) =_\mathcal{E} t_i$ and $f(u_j) =_\mathcal{E} u_j$. Hence, $C(f(t_1) \ldots f(t_n)) =_\mathcal{E}
D(f(u_1) \ldots f(u_p))$. Since the family is valid and $f(t_1) \ldots f(t_n)$ and $f(u_1) \ldots f(u_p)$ are values,
$f_C(f(t_1) \ldots f(t_n)) = f_D(f(u_1) \ldots f(u_p))$. Thus, $f(t) = f(u)$. ■

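Conversely, given $f : \mathcal{T}_\pi(\mathcal{A}_\Gamma) \to \mathcal{T}_\pi(\mathcal{A}_\Gamma)$, one can easily define a family of
construction functions that is valid whenever $f$ is a valid normalization function.

Definition 7 (Associated family of constr. functions) Given a CDT $\Gamma =
(\pi, \mathcal{C}, \Sigma)$ and a function $f : \mathcal{T}_\pi(\mathcal{A}_\Gamma) \to \mathcal{T}_\pi(\mathcal{A}_\Gamma)$, the family of construction
functions associated to $f$ is the family $(f_C)_{C \in \mathcal{C}}$ such that, for all $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$
and $t_i \in \mathcal{T}_{\sigma_i}(\mathcal{A}_\Gamma)$, $f_C(t_1, \ldots, t_n) = f(C(t_1, \ldots, t_n))$.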

Theorem 8 The family of construction functions associated to a valid normal- 
ization function is valid. 
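As an illustration of the converse direction, the sketch below starts from a normalization function and derives the associated family by applying it to the freshly built term. The theory (Zero neutral on both sides of Plus) and the names `f_zero`, `f_one`, `f_opp` and `f_plus` are assumptions made for the example.

```ocaml
type cexp = Zero | One | Opp of cexp | Plus of cexp * cexp

(* A normalization function written directly, without construction
   functions (assumption: Zero is left and right neutral for Plus). *)
let rec normalize = function
  | Plus (x, y) ->
    (match normalize x, normalize y with
     | Zero, v | v, Zero -> v
     | u, v -> Plus (u, v))
  | Opp x -> Opp (normalize x)
  | t -> t

(* Associated family: f_C (t1, ..., tn) = f (C (t1, ..., tn)). *)
let f_zero = normalize Zero
let f_one = normalize One
let f_opp x = normalize (Opp x)
let f_plus (x, y) = normalize (Plus (x, y))
```

This family re-traverses arguments that are already normal forms, which is the kind of inefficiency that hand-written construction functions avoid.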

Example 3 We can choose cexp as the underlying CDT and $\mathcal{E}$ = { Plus x
Zero = x } to define an RDT implementable by the PDT exp, with the valid
family of construction functions zero, one, opp, plus.

4 On the existence of construction functions 

In this section, we provide a general theorem for the existence of valid families 
of construction functions based on rewriting theory. We recall the notions of 
rewriting and completion. The interested reader may find more details in [8]. 

Standard rewriting. A rewrite rule is an ordered pair of terms $(l, r)$ written
$l \to r$. A rule is left-linear if no variable occurs twice in its left hand side $l$.

As usual, the set $Pos(t)$ of positions in $t$ is defined as a set of words on positive
integers. Given $p \in Pos(t)$, let $t|_p$ be the subterm of $t$ at position $p$ and $t[u]_p$ be
the term $t$ with $t|_p$ replaced by $u$.

Given a finite set $\mathcal{R}$ of rewrite rules, the rewriting relation is defined as
follows: $t \to_\mathcal{R} u$ iff there are $p \in Pos(t)$, $l \to r \in \mathcal{R}$ and a substitution $\theta$ such
that $t|_p = l\theta$ and $u = t[r\theta]_p$. A term $t$ is an $\mathcal{R}$-normal form if there is no $u$ such
that $t \to_\mathcal{R} u$. Let $=_\mathcal{R}$ be the symmetric, reflexive and transitive closure of $\to_\mathcal{R}$.

A reduction ordering $\succ$ is a well-founded ordering (there is no infinitely
decreasing sequence $t_0 \succ t_1 \succ \ldots$) stable by context ($C(..t..) \succ C(..u..)$ whenever
$t \succ u$) and substitution ($t\theta \succ u\theta$ whenever $t \succ u$). If $\mathcal{R}$ is included in a reduction
ordering, then $\to_\mathcal{R}$ is well-founded (terminating, strongly normalizing).



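We say that $\to_\mathcal{R}$ is confluent if, for all terms $t, u, v$ such that $u \leftarrow^*_\mathcal{R} t \to^*_\mathcal{R} v$,
there exists a term $w$ such that $u \to^*_\mathcal{R} w \leftarrow^*_\mathcal{R} v$, where $\to^*_\mathcal{R}$ denotes the
reflexive and transitive closure of $\to_\mathcal{R}$. This means that the relation
$\leftarrow^*_\mathcal{R} \to^*_\mathcal{R}$ is included in the relation $\to^*_\mathcal{R} \leftarrow^*_\mathcal{R}$ (composition of relations is written
by juxtaposition).

If $\to_\mathcal{R}$ is confluent, then every term has at most one normal form. If $\to_\mathcal{R}$ is
well-founded, then every term has at least one normal form. Therefore, if $\to_\mathcal{R}$ is
confluent and terminating, then every term has a unique normal form.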
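The notions above can be sketched with a small generic rewriting engine. The representation of terms (`Var`, `App`), the function names and the innermost strategy are all assumptions chosen for illustration; normalization only terminates when the rule set does.

```ocaml
(* First-order terms: variables and constructor applications. *)
type term = Var of string | App of string * term list

(* match_term pat t sub extends the substitution sub so that pat,
   once instantiated, is syntactically equal to t. *)
let rec match_term pat t sub =
  match pat, t with
  | Var x, _ ->
    (match List.assoc_opt x sub with
     | None -> Some ((x, t) :: sub)
     | Some t' -> if t = t' then Some sub else None)
  | App (f, ps), App (g, ts) when f = g && List.length ps = List.length ts ->
    List.fold_left2
      (fun acc p u ->
         match acc with
         | None -> None
         | Some s -> match_term p u s)
      (Some sub) ps ts
  | App _, _ -> None

(* Apply a substitution to a term. *)
let rec apply sub = function
  | Var x -> (match List.assoc_opt x sub with Some t -> t | None -> Var x)
  | App (f, ts) -> App (f, List.map (apply sub) ts)

(* One rewrite step at the root, using the first applicable rule l -> r. *)
let rewrite_root rules t =
  List.find_map
    (fun (l, r) -> Option.map (fun s -> apply s r) (match_term l t []))
    rules

(* Innermost normalization: normalize the arguments, then the root. *)
let rec normalize rules t =
  let t' =
    match t with
    | Var _ -> t
    | App (f, ts) -> App (f, List.map (normalize rules) ts)
  in
  match rewrite_root rules t' with
  | Some u -> normalize rules u
  | None -> t'

(* Example rules: Plus(Zero, x) -> x and Plus(x, Zero) -> x. *)
let zero = App ("Zero", [])
let one = App ("One", [])
let plus a b = App ("Plus", [a; b])
let rules = [ (plus zero (Var "x"), Var "x"); (plus (Var "x") zero, Var "x") ]
```

Since this two-rule system is terminating and confluent, `normalize rules` returns the unique normal form of a term.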

Standard completion. Given a finite set $\mathcal{E}$ of equations and a reduction
ordering $\succ$, the standard Knuth-Bendix completion procedure [2] tries to find a
finite set $\mathcal{R}$ of rewrite rules such that:

• $\mathcal{R}$ is included in $\succ$,

• $\to_\mathcal{R}$ is confluent,

• $\mathcal{R}$ and $\mathcal{E}$ have the same theory: $=_\mathcal{E}$ and $=_\mathcal{R}$ coincide.

Note that completion may fail or not terminate but, in case of successful
termination, $\mathcal{R}$-normalization provides a decision procedure for $=_\mathcal{E}$ since $t =_\mathcal{E} u$
iff the $\mathcal{R}$-normal forms of $t$ and $u$ are syntactically equal.

However, since permutation theories like commutativity or associativity and 
commutativity together (written AC for short) are included in no reduction 
ordering, dealing with them requires to consider rewriting with pattern matching 
modulo these theories and completion modulo these theories. In this paper, we 
restrict our attention to AC. 

Definition 9 (Associative-commutative equations) Let Com be the set of
commutative constructors, i.e. the set of constructors $C$ such that $\mathcal{E}$ contains an
equation of the form $C(x, y) = C(y, x)$. Then, let $\mathcal{E}_{AC}$ be the subset of $\mathcal{E}$ made of
the commutativity and associativity equations for the commutative constructors,
$=_{AC}$ be the smallest congruence relation containing $\mathcal{E}_{AC}$ and $\mathcal{E}_{\neg AC} = \mathcal{E} \setminus \mathcal{E}_{AC}$.

Rewriting modulo AC. Given a set $\mathcal{R}$ of rewrite rules, rewriting with pattern
matching modulo AC is defined as follows: $t \to_{\mathcal{R},AC} u$ iff there are $p \in Pos(t)$,
$l \to r \in \mathcal{R}$ and a substitution $\theta$ such that $t|_p =_{AC} l\theta$ and $u = t[r\theta]_p$. A reduction
ordering $\succ$ is AC-compatible if, for all terms $t, t', u, u'$ such that $t =_{AC} t'$ and
$u =_{AC} u'$, $t' \succ u'$ iff $t \succ u$. The relation $\to_{\mathcal{R},AC}$ is confluent modulo AC if
$(\leftarrow^*_{\mathcal{R},AC} =_{AC} \to^*_{\mathcal{R},AC}) \subseteq (\to^*_{\mathcal{R},AC} =_{AC} \leftarrow^*_{\mathcal{R},AC})$.

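Completion modulo AC. Given a finite set $\mathcal{E}$ of equations and an AC-compatible
reduction ordering $\succ$, completion modulo AC [18] tries to find a
finite set $\mathcal{R}$ of rules such that:

• $\mathcal{R}$ is included in $\succ$,

• $\to_{\mathcal{R},AC}$ is confluent modulo AC,

• $\mathcal{E}$ and $\mathcal{R} \cup \mathcal{E}_{AC}$ have the same theory: $=_\mathcal{E}$ and $=_{\mathcal{R} \cup \mathcal{E}_{AC}}$ coincide.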

Definition 10 A theory $\mathcal{E}$ has a complete presentation if there is an AC-compatible
reduction ordering for which the AC-completion of $\mathcal{E}_{\neg AC}$ successfully
terminates.



Many interesting systems have a complete presentation: (commutative) mo- 
noids, (abelian) groups, rings, etc. See [13, 5] for a catalog. Moreover, there are 
automated tools implementing completion modulo AC. See for instance [6, 12]. 

A term may have distinct $\mathcal{R},AC$-normal forms but, by confluence modulo
AC, all normal forms are AC-equivalent and one can easily define a notion of
normal form for AC-equivalent terms [13]:

Definition 11 (AC-normal form) Given an associative and commutative constructor
$C$, $C$-left-combs (resp. $C$-right-combs) and their leaves are inductively
defined as follows:

- If $t$ is not headed by $C$, then $t$ is both a $C$-left-comb and a $C$-right-comb. The
list of leaves of $t$ is the one-element list $leaves(t) = [t]$.

- If $t$ is not headed by $C$ and $u$ is a $C$-right-comb, then $C(t, u)$ is a $C$-right-comb.
The list of leaves of $C(t, u)$ is $t :: leaves(u)$.

- If $t$ is not headed by $C$ and $u$ is a $C$-left-comb, then $C(u, t)$ is a $C$-left-comb.
The list of leaves of $C(u, t)$ is $leaves(u) @ [t]$, where @ is the concatenation.

Let orient be a function associating a kind of combs (left or right) to every
AC-constructor. Let $\le$ be a total ordering on terms. Then, a term $t$ is in AC-normal
form wrt orient and $\le$ if:

- Every subterm of $t$ headed by an AC-constructor $C$ is an $orient(C)$-comb
whose leaves are in increasing order wrt $\le$.

- For every subterm of $t$ of the form $C(u, v)$ with $C$ commutative but
non-associative, we have $u \le v$.

As is well known, one can put any term in AC-normal form:

Theorem 12 Whatever the function orient and the ordering $\le$ are, every term
$t$ has an AC-normal form $t\downarrow_{AC}$ wrt orient and $\le$, and $t =_{AC} t\downarrow_{AC}$.

Proof. Let $\mathcal{A}$ be the set of rules obtained by choosing an orientation for the
associativity equations of $\mathcal{E}_{AC}$ according to orient:

- If $orient(C)$ is "left", then take C(x,C(y,z)) -> C(C(x,y),z).

- If $orient(C)$ is "right", then take C(C(x,y),z) -> C(x,C(y,z)).

$\to_\mathcal{A}$ is a confluent and terminating relation putting every subterm headed by
an AC-constructor into a comb form according to orient. Let comb be a function
computing the $\mathcal{A}$-normal form of a term. Let now sort be a function permuting
the leaves of combs and the arguments of commutative but non-associative
constructors to put them in increasing order wrt $\le$. Then, the function $sort \circ comb$
computes the AC-normal form of any term and $sort(comb(t)) =_{AC} t$. ■
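The composition of comb and sort can be sketched as follows for the type cexp with Plus as the only AC-constructor. The choices made here are assumptions for illustration: right combs, the built-in polymorphic comparison as the total ordering, and the names `leaves`, `right_comb`, `ac_normal` and `ac_equiv`.

```ocaml
type cexp = Zero | One | Opp of cexp | Plus of cexp * cexp

(* Leaves of a Plus-tree, each leaf being put in AC-normal form itself. *)
let rec leaves = function
  | Plus (x, y) -> leaves x @ leaves y
  | Opp x -> [Opp (ac_normal x)]
  | t -> [t]

(* Rebuild a right comb from a non-empty list of leaves. *)
and right_comb = function
  | [] -> invalid_arg "right_comb: empty list of leaves"
  | [t] -> t
  | t :: ts -> Plus (t, right_comb ts)

(* AC-normal form: flatten, sort the leaves, rebuild a right comb. *)
and ac_normal t = right_comb (List.sort compare (leaves t))

(* Decision procedure for AC-equivalence. *)
let ac_equiv x y = ac_normal x = ac_normal y
```

Flattening and sorting are done in two separate passes here; interleaving them, as construction functions do, avoids building the intermediate list.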

This naturally provides a decision procedure for AC-equivalence: the function
$\lambda x y. sort(comb(x)) = sort(comb(y))$. It follows that $\mathcal{R},AC$-normalization
together with AC-normalization provides a valid normalization function, hence
the existence of a valid family of construction functions:



Theorem 13 If $\mathcal{E}$ has a complete presentation, then there exists a valid family
of construction functions.

Proof. Assume that $\mathcal{E}$ has a complete presentation $\mathcal{R}$. We define the
computation of normal forms as it is generally implemented in rewriting tools. Let
step be a function making an $\mathcal{R},AC$-rewrite step if there is one, or failing if the
term is in normal form. Let norm be the function applying step until a normal
form is reached. Since $\mathcal{R}$ is a complete presentation of $\mathcal{E}$, by definition of the
completion procedure, $sort \circ comb \circ norm$ is a valid normalization function. Thus,
by Theorem 8, the associated family of construction functions is valid. ■

The construction functions described in the proof are not very efficient since 
they are based on rewriting with pattern matching modulo AC, which is NP- 
complete [1], and do not take advantage of the fact that, by definition of PDTs, 
they are only applied to terms already in normal form. We can therefore wonder 
whether they can be defined in a more efficient way for some common equational 
theories like the ones of Figure 1. 



Fig. 1. Some common equations on binary constructors

Name            Abbrev        Definition                    Example
associativity   Assoc(C)      C(C(x,y),z) = C(x,C(y,z))     (x + y) + z = x + (y + z)
commutativity   Com(C)        C(x,y) = C(y,x)               x + y = y + x
neutrality      Neu(C,E)      C(x,E) = x                    x + 0 = x
inverse         Inv(C,I,E)    C(x,I(x)) = E                 x + (-x) = 0
idempotence     Idem(C)       C(x,x) = x                    x ∧ x = x
nilpotence      Nil(C,A)      C(x,x) = A                    x ⊕ x = ⊥ (exclusive or)



Rewriting also provides a way to check the validity of construction functions:

Theorem 14 If $\mathcal{E}$ has a complete presentation $\mathcal{R}$ and $\mathcal{F} = (f_C)_{C \in \mathcal{C}}$ is a family
such that, for all $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$ and terms $v_i \in Val(\sigma_i)$, $f_C(v_1 \ldots v_n)$ is an
$\mathcal{R},AC$-normal form of $C(v_1 \ldots v_n)$ in AC-normal form, then $\mathcal{F}$ is valid.

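Proof.

- Correctness. Let $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$ and $v_i \in Val(\sigma_i)$. Since $f_C(v_1 \ldots v_n)$ is an
$\mathcal{R},AC$-normal form of $C(v_1 \ldots v_n)$, we clearly have $f_C(v_1 \ldots v_n) =_\mathcal{E} C(v_1 \ldots v_n)$.

- Completeness. Let $C : \sigma_1 \ldots \sigma_n \pi \in \Sigma$, $v_i \in Val(\sigma_i)$, $D : \tau_1 \ldots \tau_p \pi \in \Sigma$, and
$w_i \in Val(\tau_i)$ such that $C(v_1 \ldots v_n) =_\mathcal{E} D(w_1 \ldots w_p)$. Since $\mathcal{R}$ is a complete
presentation of $\mathcal{E}$, $norm(C(v_1 \ldots v_n)) =_{AC} norm(D(w_1 \ldots w_p))$. Since AC-equivalent terms
have the same AC-normal form, $f_C(v_1 \ldots v_n) = f_D(w_1 \ldots w_p)$. ■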

It follows that rewriting provides a natural way to explain what the
possible values of an RDT are: values are AC-normal forms matching no left hand
side of a rule of $\mathcal{R}$.



5 Towards efficient construction functions 



When there is no commutative symbol, construction functions can be easily 
implemented by simulating innermost rewriting as follows: 

Definition 15 (Linearization) Let $VPos(t)$ be the set of positions $p \in Pos(t)$
such that $t|_p$ is a variable $x \in \mathcal{X}$. Let $\rho : VPos(t) \to \mathcal{X}$ be an injective mapping
and $lin(t)$ be the term obtained by replacing in $t$ every subterm at position
$p \in VPos(t)$ by $\rho(p)$. Let now $Eq(t)$ be the conjunction of true and of the
equations $\rho(p) = \rho(q)$ such that $t|_p = t|_q$ and $p, q \in VPos(t)$.

Definition 16 Given a set $\mathcal{R}$ of rewrite rules, let $\mathcal{F}(\mathcal{R})$ be the family of
construction functions $(f_C)_{C \in \mathcal{C}}$ defined as follows:

• For every rule $l \to r \in \mathcal{R}$ with $l = C(l_1, \ldots, l_n)$, add to the definition of
$f_C$ the clause $lin(l_1), \ldots, lin(l_n)$ when $Eq(l)$ -> $\widehat{lin(r)}$, where $\widehat{t}$ is the term
obtained by replacing in $t$ every occurrence of a constructor $C$ by a call to its
construction function $f_C$.

• Terminate the definition of $f_C$ by the default clause x -> C(x).

Theorem 17 Assume that $\mathcal{E}_{AC} = \emptyset$ and $\mathcal{E}$ has a complete presentation $\mathcal{R}$.
Then, $\mathcal{F}(\mathcal{R})$ is valid wrt $\mathcal{E}$ (whatever the order of the non-default clauses is).
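As a hedged illustration of this translation, consider a hypothetical type `form` (not from the paper) with the two rules Not(Not(x)) -> x and And(x,x) -> x, a system that is terminating and has only joinable critical pairs. The second rule is not left-linear, so its linearized pattern carries an equality guard.

```ocaml
type form = True | False | Not of form | And of form * form

(* Rule Not(Not(x)) -> x: the argument pattern is Not x, no guard needed. *)
let f_not = function
  | Not x -> x
  | x -> Not x                      (* default clause *)

(* Rule And(x,x) -> x: linearized to And(x,y) with the guard x = y. *)
let f_and = function
  | (x, y) when x = y -> x
  | (x, y) -> And (x, y)            (* default clause *)
```

Both right hand sides are plain variables here; a right hand side containing constructors would call the corresponding construction functions instead.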

We now consider the case of commutative symbols. We are going to describe 
a modular way of defining the construction functions by pursuing our running 
example, with the type exp. Assume that Plus is declared to be associative and 
commutative only. The construction functions can then be defined as follows: 

let rec zero = Zero and one = One and opp x = Opp x
and plus = function
| Plus(x,y), z -> plus (x, plus (y,z))
| x, y -> insert_plus x y
and insert_plus x = function
| Plus(y,_) as u when x <= y -> Plus(x,u)
| Plus(y,t) -> Plus (y, insert_plus x t)
| u when x > u -> Plus(u,x)
| u -> Plus(x,u)

One can easily see that plus does the same job as the function $sort \circ comb$
used in Theorem 12 but in a slightly more efficient way since $\mathcal{A}$-normalization
and sorting are interleaved.
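The behaviour of these functions can be checked on small inputs. The sketch below restates them in a self-contained form; it assumes the built-in polymorphic comparison as the ordering, under which Zero < One < Opp _ < Plus _.

```ocaml
type exp = Zero | One | Opp of exp | Plus of exp * exp

(* plus flattens its left argument; insert_plus inserts one leaf into a
   sorted right comb. *)
let rec plus = function
  | Plus (x, y), z -> plus (x, plus (y, z))
  | x, y -> insert_plus x y

and insert_plus x = function
  | Plus (y, _) as u when x <= y -> Plus (x, u)
  | Plus (y, t) -> Plus (y, insert_plus x t)
  | u when x > u -> Plus (u, x)
  | u -> Plus (x, u)

(* Two ways of summing the same three leaves. *)
let left = plus (plus (One, Zero), Opp One)
let right = plus (Opp One, plus (Zero, One))
```

Both `left` and `right` evaluate to the same sorted right comb, Plus (Zero, Plus (One, Opp One)), whatever the order and grouping of the arguments.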

Assume moreover that Zero is neutral. The AC-completion of { Plus(Zero, x)
= x } gives { Plus(Zero, x) -> x }. Hence, if x and y are terms in normal form,
then Plus(x,y) can be rewritten modulo AC only if x = Zero or y = Zero.
Thus, the function plus needs to be extended with two new clauses only:



and plus = function
| Zero, y -> y
| x, Zero -> x
| Plus(x,y), z -> plus (x, plus (y,z))
| x, y -> insert_plus x y

Assume now that Plus is declared to have Opp as inverse. Then, the completion
modulo AC of { Plus(Zero, x) = x, Plus(Opp(x), x) = Zero } gives
the following well known rules for abelian groups [13]: { Plus(Zero, x) -> x,
Plus(Opp(x), x) -> Zero, Plus(Plus(Opp(x), x), y) -> y, Opp(Zero) -> Zero,
Opp(Opp(x)) -> x, Opp(Plus(x, y)) -> Plus(Opp(y), Opp(x)) }.

The rules for Opp are easily translated as follows: 

and opp = function
| Zero -> Zero
| Opp(x) -> x
| Plus(x,y) -> plus (opp y, opp x)
| x -> Opp(x)

The third rule of abelian groups is called an extension of the second one
since it is obtained by first adding the context Plus([], y) on both sides of this
second rule, then normalizing the right hand side. Take now two terms x and y in
normal form and assume that (x, y) matches none of the three clauses previously
defining plus, that is, x and y are distinct from Zero, and x is not of the form
Plus(x1, x2). To get the normal form of Plus(x, y), we need to check that x and
the normal form of its opposite Opp(x) do not occur in y. The last clause defining
plus needs therefore to be modified as follows:

and plus = function
| Zero, y -> y
| x, Zero -> x
| Plus(x,y), z -> plus (x, plus (y,z))
| x, y -> insert_opp_plus (opp x) y

and insert_opp_plus x y =
try delete_plus x y
with Not_found -> insert_plus (opp x) y

and delete_plus x = function
| Plus(y,_) when x < y -> raise Not_found
| Plus(y,t) when x = y -> t
| Plus(y,t) -> Plus (y, delete_plus x t)
| y when y = x -> Zero
| _ -> raise Not_found
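Putting the fragments of the running example together gives the following self-contained sketch for the abelian group case. It only assembles the clauses given in the text, under the assumption that the built-in polymorphic comparison is the ordering on terms.

```ocaml
type exp = Zero | One | Opp of exp | Plus of exp * exp

let rec opp = function
  | Zero -> Zero
  | Opp x -> x
  | Plus (x, y) -> plus (opp y, opp x)
  | x -> Opp x

and plus = function
  | Zero, y -> y
  | x, Zero -> x
  | Plus (x, y), z -> plus (x, plus (y, z))
  | x, y -> insert_opp_plus (opp x) y

(* x is the opposite of the leaf to add: cancel it if it occurs in y,
   otherwise insert the leaf itself, which is opp x. *)
and insert_opp_plus x y =
  try delete_plus x y with Not_found -> insert_plus (opp x) y

and delete_plus x = function
  | Plus (y, _) when x < y -> raise Not_found
  | Plus (y, t) when x = y -> t
  | Plus (y, t) -> Plus (y, delete_plus x t)
  | y when y = x -> Zero
  | _ -> raise Not_found

and insert_plus x = function
  | Plus (y, _) as u when x <= y -> Plus (x, u)
  | Plus (y, t) -> Plus (y, insert_plus x t)
  | u when x > u -> Plus (u, x)
  | u -> Plus (x, u)
```

For instance, adding One to its opposite yields Zero, and adding Opp One to One + One cancels one occurrence of One.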

Forgetting about Zero and Opp, suppose now that Plus is declared associa- 
tive, commutative and idempotent. The function plus is kept but the insert 
function is modified as follows: 



and insert_plus x = function
| Plus(y,_) as u when x = y -> u
| Plus(y,_) as u when x < y -> Plus(x,u)
| Plus(y,t) -> Plus (y, insert_plus x t)
| u when x > u -> Plus(u,x)
| u when x = u -> u
| u -> Plus(x,u)

Nilpotence can be dealt with in a similar way. 
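A possible reading of the nilpotent case is sketched below for exclusive or with False as both neutral and nilpotence result. The type `bexp`, the constructor `Atom` and the function names are hypothetical. One deliberate deviation from the idempotent version: the recursive clause calls the construction function `xor` instead of the constructor, since the tail may collapse to False.

```ocaml
type bexp = False | Atom of string | Xor of bexp * bexp

let rec xor = function
  | False, y -> y
  | x, False -> x
  | Xor (x, y), z -> xor (x, xor (y, z))
  | x, y -> insert_xor x y

and insert_xor x = function
  | Xor (y, t) when x = y -> t                 (* x xor x = False, which is neutral *)
  | Xor (y, _) as u when x < y -> Xor (x, u)
  | Xor (y, t) -> xor (y, insert_xor x t)      (* the tail may become False *)
  | u when x = u -> False
  | u when x > u -> Xor (u, x)
  | u -> Xor (x, u)

let a = Atom "a"
let b = Atom "b"
```

Each atom then occurs at most once in a normal form, so normal forms can be read as sets of atoms under symmetric difference.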

In conclusion, for various combinations of the equations of Figure 1, we can 
define in a nice modular way construction functions that are more efficient than 
the ones based on rewriting modulo AC. We summarize this as follows: 

Definition 18 A set of equations $\mathcal{E}$ is a theory of type:

(1) if $\mathcal{E}_{AC} = \emptyset$ and $\mathcal{E}$ has a complete presentation,

(2) if $\mathcal{E}$ is the union of {Assoc(C), Com(C)} with either {Neu(C,E), Inv(C,I,E)},
{Idem(C)}, {Neu(C,E), Idem(C)}, {Nil(C,A)} or {Neu(C,E), Nil(C,A)}.

Two theories are disjoint if they share no symbol. 

Let us give schemes for construction functions for theories of type 2. A clause 
is generated only if the conditions Neu(C,E), Inv(C,I,E), etc. are satisfied. 
These conditions are not part of the generated code. 

let rec f_C = function
| E, x when Neu(C,E) -> x
| x, E when Neu(C,E) -> x
| C(x,y), z when Assoc(C) -> f_C(x, f_C(y,z))
| x, y when Inv(C,I,E) -> insert_inv_C (f_I x) y
| x, y -> insert_C x y

and f_I = function
| E -> E
| I(x) -> x
| C(x,y) -> f_C(f_I y, f_I x)
| x -> I x

and insert_inv_C x y =
try delete_C x y
with Not_found -> insert_C (f_I x) y

and delete_C x = function
| C(y,_) when x < y -> raise Not_found
| C(y,t) when x = y -> t
| C(y,t) -> C(y, delete_C x t)
| y when y = x -> E
| _ -> raise Not_found

and insert_C x = function
| C(y,_) as u when x = y & Idem(C) -> u
| C(y,t) when x = y & Nil(C,A) -> f_C(A,t)
| C(y,_) as u when x <= y & Com(C) -> C(x,u)
| C(y,t) when Com(C) -> C(y, insert_C x t)
| u when x > u & Com(C) -> C(u,x)
| u when x = u & Idem(C) -> u
| u when x = u & Nil(C,A) -> A
| u -> C(x,u)

Theorem 19 Let $\mathcal{E}$ be the union of pairwise disjoint theories of type 1 or 2.
Assume that, for every constructor $C$ whose theory is of type $k$, $f_C$ is defined as in
Definition 16 if $k = 1$, and as above if $k = 2$. Then, $(f_C)_{C \in \mathcal{C}}$ is valid wrt $\mathcal{E}$.

Proof. Assume that $\mathcal{E} = \bigcup_{i=1}^{n} \mathcal{E}_i$ where $\mathcal{E}_1, \ldots, \mathcal{E}_n$ are pairwise disjoint
theories of type 1 or 2. Whatever the type of $\mathcal{E}_i$ is, we saw that $\mathcal{E}_i$ has a complete
presentation $\mathcal{R}_i$. Therefore, since $\mathcal{E}_1, \ldots, \mathcal{E}_n$ share no symbol, by definition of
completion, the AC-completion of $\mathcal{E}$ successfully terminates with $\mathcal{R} = \bigcup_{i=1}^{n} \mathcal{R}_i$.
Thus, $\to_{\mathcal{R},AC}$ is terminating and AC-confluent. Since $\mathcal{F} = (f_C)_{C \in \mathcal{C}}$ computes
$\mathcal{R},AC$-normal forms in AC-normal form, by Theorem 14, $\mathcal{F}$ is valid. ■

The construction functions of type 2 can be easily extended to deal with ring
or lattice structures (distributivity and absorption equations).

More general results can be expected by using or extending results on the 
modularity of completeness for the combination of rewrite systems. The
completeness of hierarchical combinations of non-AC-rewrite systems is studied in
[19]. Note however that the modularity of confluence for AC-rewrite systems has
been formally established only recently in [14].

Note that the construction function definitions of type 1 or 2 provide the 
same results with call-by-value, call-by-name or lazy evaluation strategy. 

The detailed study of the complexity of these definitions (compared to
AC-rewriting) is left for future work.

6 The Moca system 

We now describe the Moca prototype, a program generator that implements an
extension of OCaml with RDTs. Moca parses a special ".mlm" file containing the
RDT definition and produces a regular OCaml module (interface and
implementation) which provides the construction functions for the RDT. Moca
provides a set of keywords for specifying the equations described in Figure 1.
For instance, the RDT exp can be defined in Moca as follows:

type exp = private Zero | One | Opp of exp | Plus of exp * exp
begin associative commutative neutral (Zero) opposite (Opp) end
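
To make the intent of this declaration concrete, the following is a hand-written sketch of the kind of module one might expect for exp. It is not Moca's actual output. It instantiates the type-2 scheme of Section 5 with C = Plus, E = Zero and I = Opp, and assumes OCaml's polymorphic comparison as the term ordering. The module name Exp and the function names zero, one, opp and plus are assumptions made for illustration. One deliberate deviation from the scheme: delete avoids rebuilding Plus(y, Zero) when the removed element was the last one of the comb.

```ocaml
module Exp : sig
  (* Private constructors: clients can pattern match but not build values. *)
  type exp = private Zero | One | Opp of exp | Plus of exp * exp
  val zero : exp
  val one : exp
  val opp : exp -> exp
  val plus : exp * exp -> exp
end = struct
  type exp = Zero | One | Opp of exp | Plus of exp * exp

  let zero = Zero
  let one = One

  (* Neutral element, then associativity (right combs), then insertion. *)
  let rec plus = function
    | Zero, x | x, Zero -> x
    | Plus (x, y), z -> plus (x, plus (y, z))
    | x, y -> insert_inv (opp x) y

  (* Construction function for the inverse constructor. *)
  and opp = function
    | Zero -> Zero
    | Opp x -> x
    | Plus (x, y) -> plus (opp y, opp x)
    | x -> Opp x

  (* Cancel x against y if possible, otherwise insert the opposite of x. *)
  and insert_inv x y =
    try delete x y with Not_found -> insert (opp x) y

  (* Remove one occurrence of x from the sorted comb. *)
  and delete x = function
    | Plus (y, _) when x < y -> raise Not_found
    | Plus (y, t) when x = y -> t
    | Plus (y, t) ->
      (* Deviation from the scheme: do not rebuild Plus (y, Zero). *)
      (match delete x t with Zero -> y | t' -> Plus (y, t'))
    | y when y = x -> Zero
    | _ -> raise Not_found

  (* Sorted insertion (commutativity); no idempotence or nilpotence here. *)
  and insert x = function
    | Plus (y, _) as u when x <= y -> Plus (x, u)
    | Plus (y, t) -> Plus (y, insert x t)
    | u when x > u -> Plus (u, x)
    | u -> Plus (x, u)
end
```

Outside the module, writing Exp.Plus (Exp.One, Exp.Zero) is rejected by the compiler, while matching a value against Exp.Plus (x, y) remains allowed.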

Moca also supports arbitrary user-defined rules with the construction: rule
pattern -> pattern. Each such rule adds an extra clause to the definitions of
the construction functions generated by Moca: the LHS pattern is copied verbatim
as the pattern of a clause which returns the RHS pattern considered as an
expression where constructors are replaced by calls to the corresponding
construction functions. Of course, in the presence of such arbitrary rules, we
cannot guarantee the termination or completeness of the generated code. This
construction is thus provided for expert users who can prove termination and
completeness of the corresponding set of rules. That way, the programmer can
describe complex RDTs, even those which cannot be described with the set of
predefined equational invariants.
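
As an illustration of how a rule becomes a clause, consider a hypothetical distributivity rule written as rule Mul (x, Plus (y, z)) -> Plus (Mul (x, y), Mul (x, z)). The sketch below shows, by hand, the clause such a rule would plausibly contribute. The type, the module name Ring and the function names are assumptions, and Plus carries no invariant here so that the example stays small.

```ocaml
module Ring : sig
  type t = private Zero | One | Plus of t * t | Mul of t * t
  val zero : t
  val one : t
  val plus : t * t -> t
  val mul : t * t -> t
end = struct
  type t = Zero | One | Plus of t * t | Mul of t * t

  let zero = Zero
  let one = One

  (* No invariant on Plus in this sketch: plain construction. *)
  let plus (x, y) = Plus (x, y)

  let rec mul = function
    (* Clause for the hypothetical rule: the LHS arguments are the pattern,
       and the RHS constructors Plus and Mul become calls to plus and mul. *)
    | x, Plus (y, z) -> plus (mul (x, y), mul (x, z))
    (* Default clause: build the term as is. *)
    | x, y -> Mul (x, y)
end
```

Termination of this particular clause is easy to see, since the second argument of mul shrinks at each recursive call. In general that proof obligation falls on the user.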

Moca also accepts polymorphic RDTs and RDTs mutually defined with record 
types (but equations between record fields are not yet available). 

The equations of Figure 1 also support n-ary constructors, implemented as
unary constructors of type t list -> t. In this case, Plus gets a single
argument of type exp list. Normal forms are modified accordingly and use lists
instead of combs. For instance, associative normal forms get flat lists of
arguments: in a Plus(l) expression, no element of l is a Plus(l') expression.
The corresponding data structure is widely used in rewriting.
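
The flatness invariant for an associative n-ary constructor can be sketched as follows. This is a minimal illustration, not Moca's generated code: the type t, the Atom constructor and the module name Nary are assumptions, and only associativity is handled (no sorting, no neutral element, no special case for singleton lists).

```ocaml
module Nary : sig
  type t = private Atom of string | Plus of t list
  val atom : string -> t
  val plus : t list -> t
end = struct
  type t = Atom of string | Plus of t list

  let atom s = Atom s

  (* Arguments are already normal forms, because values can only be built
     through this module, so splicing one level of nested Plus is enough. *)
  let plus args =
    let splice = function
      | Plus l -> l
      | x -> [x]
    in
    Plus (List.concat (List.map splice args))
end
```

For example, Nary.plus [a; Nary.plus [b; c]] yields a Plus node whose argument list is [a; b; c].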

Finally, Moca offers an important additional feature: it can generate
construction functions that provide maximally shared representatives. To enable
maximal sharing, just add the -sharing option when compiling the ".mlm" file. In
this case, the generated type is slightly modified, since every functional
constructor gets an extra argument to keep the hash code of the term. Maximally
shared representatives have many good properties: not only is data size minimal
and can user's memoized functions be very fast, but comparison between
representatives is turned from a complex recursive term comparison into a
pointer comparison, a single machine instruction. Moca heavily uses this
property for the generation of construction functions: when dealing with
non-linear equations, the maximal sharing property allows Moca to replace term
equality by pointer equality.
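
The idea of maximal sharing can be sketched with a small hash-consing table. This is an illustrative approximation under stated assumptions, not the code Moca generates: the module name Shared, the hash formula and the use of a plain (non-weak) Hashtbl are choices made here, and no equational invariant is applied, so that only the sharing mechanism is visible.

```ocaml
module Shared : sig
  (* The first field of Plus stores the hash code of the term. *)
  type t = private Zero | One | Plus of int * t * t
  val zero : t
  val one : t
  val plus : t * t -> t
end = struct
  type t = Zero | One | Plus of int * t * t

  let hash = function
    | Zero -> 0
    | One -> 1
    | Plus (h, _, _) -> h

  (* Subterms are already shared, so comparing them by pointer is enough. *)
  module H = Hashtbl.Make (struct
    type nonrec t = t
    let equal a b =
      match a, b with
      | Plus (_, x1, y1), Plus (_, x2, y2) -> x1 == x2 && y1 == y2
      | _ -> a == b
    let hash = hash
  end)

  (* A production version would likely use a weak table so that unused
     terms can be reclaimed by the garbage collector. *)
  let table : t H.t = H.create 251

  let zero = Zero
  let one = One

  let plus (x, y) =
    let h = (19 * hash x + 31 * hash y + 7) land max_int in
    let candidate = Plus (h, x, y) in
    match H.find_opt table candidate with
    | Some shared -> shared
    | None -> H.add table candidate candidate; candidate
end
```

With this construction function, two structurally equal terms are the same heap object, so a test such as Shared.plus (a, b) == Shared.plus (a, b) holds and replaces a recursive comparison.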

7 Future work 

We plan to integrate Moca into the development environment Focal [20]. Focal
units contain declarations and definitions of functions, statements and proofs
as first-class citizens. Their compilation produces both a file checkable by the
theorem prover Coq [7] and OCaml source code. Proofs are done either within
Coq or via the automatic theorem prover Zenon [9], which issues a Coq file when
it succeeds. Every Focal unit has a special field, giving the type of the data
manipulated in this unit. Thus, it would be very interesting to do a full
integration of private/relational data types in Focal, the proof of correctness
of construction functions being done with Zenon or Coq and then recorded as a
theorem to be used for further proofs. This should be completed by the
integration of a tool on rewriting and equational theories able to complete
equational presentations, to generate and prove the corresponding lemmas and to
show some termination properties. Some experiments already done within Focal on
coupling CiME [6] and Zenon give serious hope of success.

Acknowledgments. The authors thank Claude Kirchner for his comments 

References



1. D. Benanav, D. Kapur, and P. Narendran. Complexity of matching problems. J.
of Symbolic Computation, 3(1-2):203-216, 1987.

2. P. Bendix and D. Knuth. Computational problems in abstract algebra, chapter 
Simple word problems in universal algebra. Pergamon Press, 1970. 

3. F. Burton and R. Cameron. Pattern matching with abstract data types. J. of 
Functional Programming, 3(2):171-190, 1993. 

4. W. Burton, E. Meijer, P. Sansom, S. Thompson, and P. Wadler. Views: An
extension to Haskell pattern matching.
http://www.haskell.org/extensions/views.html, 1996.

5. P. Le Chenadec. Canonical forms in finitely presented algebras. Research notes in 
theoretical computer science. Pitman, 1986. 

6. E. Contejean, C. Marché, B. Monate, and X. Urbain. CiME version 2.02. LRI,
CNRS UMR 8623, Université Paris-Sud, France, 2004. http://cime.lri.fr/.

7. Coq Development Team. The Coq Proof Assistant Reference Manual, Version 8.0. 
INRIA, France, 2006. http://coq.inria.fr/. 

8. N. Dershowitz and J. -P. Jouannaud. Rewrite systems. In J. van Leeuwen, editor, 
Handbook of Theoretical Computer Science, volume B, chapter 6. North Holland, 
1990. 

9. D. Doligez. Zenon, version 0.4.1. http://focal.inria.fr/zenon/, 2006. 

10. D. Doligez, J. Garrigue, X. Leroy, D. Rémy, and J. Vouillon. The Objective Caml
system release 3.09, Documentation and user's manual. INRIA, France, 2005.
http://caml.inria.fr/.

11. S. P. Jones (editor). Haskell 98 Language and Libraries, The revised report. Cam- 
bridge University Press, 2003. 

12. J.-M. Gaillourdet, T. Hillenbrand, B. Löchner, and H. Spies. The new Waldmeister
loop at work. In Proc. of CADE '03, LNCS 2741. http://www.waldmeister.org/.

13. J.-M. Hullot. Compilation de formes canoniques dans les théories équationnelles.
PhD thesis, Université Paris 11, France, 1980.

14. J.-P. Jouannaud. Modular Church-Rosser modulo. In Proc. of RTA'06, LNCS 4098.

15. P.-E. Moreau, E. Balland, P. Brauner, R. Kopetz, and A. Reilles. Tom Manual 
version 2.3. INRIA & LORIA, Nancy, France, 2006. http://tom.loria.fr/. 

16. P.-E. Moreau, C. Ringeissen, and M. Vittek. A pattern matching compiler for 
multiple target languages. In Proc. of CC'03, LNCS 2622. 

17. C. Okasaki. Views for standard ML. In Proc. of ML'98. 

18. G. Peterson and M. Stickel. Complete sets of reductions for some equational 
theories. J. of the ACM, 28(2):233-264, 1981. 

19. K. Rao. Completeness of hierarchical combinations of term rewriting systems. In 
Proc. of FSTTCS'93, LNCS 761. 

20. R. Rioboo, D. Doligez, T. Hardin, et al. FoCal Reference Manual, version 0.3.1.
Université Paris 6, CNAM & INRIA, 2005. http://focal.inria.fr/.

21. S. Thompson. Laws in Miranda. In Proc. of LFP'86. 

22. S. Thompson. Lawful functions and program verification in Miranda. Science of 
Computer Programming, 13(2-3):181-218, 1990. 

23. P. Wadler. Views: a way for pattern matching to cohabit with data abstraction.
In Proc. of POPL '87.

24. P. Weis. Private constructors in OCaml.
http://alan.petitepomme.net/cwn/2003.07.01.html#5, 2003.



